Forum:Help: Delphi programmers
I thought I'd join the U6Edit project so I could give it a feature wherein we could do a global search for text strings. At the time I didn't realize that this project was written in Delphi, so clearly this is not going to work out. I put this in as a feature request, but since the project has been dead for years, I expect no one will read it. So I guess the only thing left is for me to make my own tool that will do a global search. I thought I might learn how to hack into the transcripts by looking at the U6Edit code, but some help from someone who knows Delphi might speed things along. Here's what I've learned so far:
*Transcripts ("conversations") are compressed LZW-style
*Transcripts are entirely in files CONVERSE.A and CONVERSE.B
*There is some cryptic talk about the file format at the end of some documentation, but I'm not clear on everything it says
AngusM 21:56, September 6, 2010 (UTC)
:I have had to program in Delphi several times in the past year. I don't like it at all, but I think I can work my way around it. I'll try and check it out, probably next weekend.--Sega381 03:14, September 7, 2010 (UTC)
::That'd be awesome! I've been trying to get into CONVERSE.A, and see if I can decompress anything. The availability of LZW C APIs is shockingly limited, and I've given up. Right now my only hope is for you to hack the U6Edit code to figure out how they get into it.
::From what I can tell, this class (or whatever you Delphi guys call it) "IConversation" holds the answer. I'm looking at this particular code here:
 type
   IConversation = interface
     ['{40FC70DB-1ED0-42B1-8B2E-23DA0CC63EBD}']
     procedure GetText(Lines: TStrings; out AName, ADescription: string;
       ShowOpcodes: Boolean);
::From what I can tell, if you can crack GetText(), you might find what we are looking for.
::It looks like there are two things we want: "conversations" most importantly, and then "literature".
::AngusM 04:13, September 7, 2010 (UTC)
:::I don't remember the details, but I used Nuvie code and code from the u6decode source to decompress the U6 voice files. I should post my code somewhere... -- Fenyx4 17:26, September 7, 2010 (UTC)
::::Yes, someone at the U6Edit project put me onto Nuvie. I was looking at it last night. It's C++ and on an SVN server! I can relate to both.
::::From what I read, the conversations have some distinction in their formatting from other compressed material. So what you did might have to be tweaked before it can be used to decompress conversations. It all had my head spinning last night. AngusM 03:21, September 8, 2010 (UTC)
:::::Just to be clear, what exactly is your goal here? To modify U6Edit to add the functionality to search? To understand how the files are compressed? To extract the whole text? To create another utility? Just so I go down the correct track...--Sega381 01:59, September 9, 2010 (UTC)
::::::I'd prefer that this function be made available in U6Edit, because we would also be using its other functions for research. However, I can't do Delphi, and I doubt the projecteers are prepared to implement it. So, this would have to be a different tool. For that, I'd first have to know the format for the "conversations" and "literature". I expect once this bizarre enigma of LZW compression is worked out, the rest would be simple. AngusM 02:30, September 9, 2010 (UTC)
:::::::Ok, I think I now understand the format of the converse.* files. I'll try and implement a small utility to extract things to see if I understood correctly, and I'll let you know how it goes.--Sega381 17:11, September 11, 2010 (UTC)
::::::::Ok. So, the details about the converse.* file format are here, if someone needs to see the complete description. In summary, the converse.* files are not LZW-compressed as a whole, but contain several conversation "blocks", and each block is LZW compressed.
::::::::The compressed conversation blocks are preceded by a single uncompressed block which contains an index of the offsets where each conversation block starts.
::::::::I took the u6decode program and heavily modified it to allow for converse.* files (or lib32 files as the doc calls them) to be decompressed, and it seems to have worked. I've uploaded the raw uncompressed conversations for both .A and .B in plain text format (with a small "Conversation N:" prefix before each new conversation inside the file), as well as the source I used and the Win32 console binary here. There was one conversation in Converse.B which I couldn't decompress, as it appeared to be an invalid LZW block. I haven't started checking what the special characters between words in each conversation mean. Hope this helps! -- Sega381 20:24, September 12, 2010 (UTC)
:::::::::Good man! Yes, my problem w/converse.? is that the 3rd and 4th bytes in each block were 0s, and it made me feel like the first 2 bytes were important. It wasn't clear if each block was simply an LZW block, or if more had to be interpreted from it. A bigger problem is that I couldn't find an open source library to uncompress LZW blocks!
:::::::::We can worry about that funky block later. For now I have the code to make this utility, so that's what I'm going to do. Good work. AngusM 01:26, September 13, 2010 (UTC)
::::::::::You're welcome. If I understand correctly, the first 4 bytes of any LZW-compressed block or file are actually not compressed and indicate the decompressed file size (the first byte is the lowest part of the size number, and each following byte has to be shifted further left to make the full number). After those four bytes the actual compressed part starts. As the conversation blocks are fairly small, it's normal for the 3rd and 4th bytes to be 0s, as the small sizes of the decompressed blocks don't need more than 2 bytes (the first two) to be described.
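The size header described above can be sketched in a few lines of Python. This is only an illustration of the "low byte first, shift the rest" reading, not code taken from u6decode or Nuvie; the function name <code>read_uncompressed_size</code> and the sample bytes are made up for the example.

```python
import struct

def read_uncompressed_size(block: bytes) -> int:
    """Return the value stored in the first four bytes of a block.

    Assumes the header is an unsigned 32-bit little-endian integer,
    i.e. the first byte is the lowest part of the number.
    """
    if len(block) < 4:
        raise ValueError("block is shorter than the 4-byte size header")
    # Manual version of the shifting described in the thread.
    shifted = block[0] | (block[1] << 8) | (block[2] << 16) | (block[3] << 24)
    # Same result using the standard library ("<I" = little-endian uint32).
    (unpacked,) = struct.unpack_from("<I", block, 0)
    assert shifted == unpacked
    return unpacked

# Made-up header bytes: 0x34 0x12 0x00 0x00 encode 0x1234.
sample = bytes([0x34, 0x12, 0x00, 0x00]) + b"...compressed data..."
print(read_uncompressed_size(sample))  # 4660
```

For any block whose decompressed size is below 65536, the 3rd and 4th bytes come out as 0, which matches what was seen in the converse files.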
::::::::::About the lib, fortunately the u6decode utility included an LZW decoding implementation, which was practically ready to be used (apart from some fiddling to adapt the code to be used for virtual blocks instead of full files). Including that weird block in .b, there are 97 conversation blocks in converse.a, and 103 conversation blocks in converse.b (plus several invalid offsets), for a total of 200 conversations (which means I was able to extract 199 conversations). Does that sound like the number of NPCs in the game?
::::::::::Btw, the code I uploaded is no work of art, it's pretty crappy in some places; I tried to make it a little structured, but I didn't have much time to refactor everything. If you need clarification on something, or if you start getting frustrated with it, just let me know. Good luck!--Sega381 02:04, September 13, 2010 (UTC)
:::::::::::I've just found out about the weird block in converse.b. For some reason (maybe it was too short to be compressed?), conversation block 27 (or 26 if you're counting from 0) contained a value of "0" in its first four bytes (the uncompressed size), and uncompressed data in the rest of the block (not LZW compressed like the rest of the blocks). So it just needed to be read and written without decompression. I've uploaded a 2.1 version that supports this kind of weird block, to the same drop.io, in case you're interested. Therefore, the converseb.txt included has the complete uncompressed conversations from converse.b.--Sega381 02:59, September 13, 2010 (UTC)
::::::::::::So... what are you saying here? That that broken block isn't really a block at all, and doesn't contain anything an NPC said? From what I'm reading, it sounds like it's playing the role of a null terminator. AngusM 02:41, September 14, 2010 (UTC)
:::::::::::::I believe he's saying that if something has a compression size of 0 then the text simply isn't compressed. So it was never broken.
:::::::::::::Just a special case his code didn't account for previously. -- Fenyx4 03:26, September 14, 2010 (UTC)
::::::::::::::Yep, what Fenyx said :). And I meant that the NEW version of converseb.txt that I uploaded after the fix now has the full conversations from CONVERSE.B, as the uncompressed conversation in block 27 has now been added.--Sega381 18:36, September 14, 2010 (UTC)
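For anyone building on this, the block handling discussed in the thread (a 4-byte size header, with a size of 0 meaning the rest of the block is stored uncompressed) could be sketched roughly as below. This is not the uploaded utility's code. The name <code>decode_block</code> is hypothetical, the LZW decoder is passed in as a parameter because no specific library is assumed, and both the choice to hand the decoder only the bytes after the header and the final length check are assumptions.

```python
from typing import Callable

def decode_block(block: bytes,
                 lzw_decompress: Callable[[bytes], bytes]) -> bytes:
    """Return the conversation bytes held in one converse.* block.

    lzw_decompress is a caller-supplied function (hypothetical here)
    that turns the compressed payload into plain bytes.
    """
    if len(block) < 4:
        raise ValueError("block is shorter than the 4-byte size header")
    size = int.from_bytes(block[:4], "little")  # low byte first
    payload = block[4:]
    if size == 0:
        # Special case from the thread: data is stored as-is, so copy it.
        return payload
    data = lzw_decompress(payload)
    if len(data) != size:
        # Assumed sanity check: the header should match the output length.
        raise ValueError(f"expected {size} bytes, got {len(data)}")
    return data

# Stand-in decoder for demonstration only; it does no real LZW work.
def fake_lzw(payload: bytes) -> bytes:
    return b"HELLO"

print(decode_block(b"\x00\x00\x00\x00Hi there", fake_lzw))  # b'Hi there'
print(decode_block(b"\x05\x00\x00\x00\x01\x02", fake_lzw))  # b'HELLO'
```

Treating 0 as "copy the bytes through" is what lets block 27 of CONVERSE.B come out intact instead of being rejected as an invalid LZW block.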